Clock failure recovery system.

In a failure recovery system of an information processing
system including a plurality of processors and a diagnosis
processor which can independently execute failure recovery for
the processors, execution start of a failure recovery program,
including the step of recognizing a processor in which a clock
failure occurs and executing logging of a clock failure event
and a clock failure state, is controlled in an interrupt
processing program of the diagnosis processor, which is started
upon interrupt.

The present invention relates to a failure recovery system in
an information processing system and, more particularly, to a
clock failure recovery system.

Conventionally, a clock failure in an information 
processing system is detected not as a clock fail- 
ure but as a failure in a logic portion caused by the 
clock failure. Therefore, when a clock failure oc- 
curs, failure recovery processing for a failure of a 
logic portion, which occurs under the influence of a 
clock failure, is executed. 

In the conventional clock failure recovery system described
above, a clock failure is detected as a failure of a logic
portion caused thereby, and the same failure recovery
processing as that for a failure of the logic portion is
executed. Therefore, it is difficult to discriminate whether a
clock failure or a failure of a logic portion has occurred. In
particular, when an intermittent clock failure, the most common
type of clock failure, occurs, the failure is not reproducible,
and it is very difficult to analyze the cause of the failure.

In recent years, supercomputers and large general-purpose
computers have grown in scale, and the number of hardware
components constituting a clock circuit has increased
accordingly. Therefore, the frequency of clock failures has
increased, and a strong demand has arisen for clock failure
recovery.



Summary of the Invention 

It is therefore an object of the present invention 
to provide a clock failure recovery system which 
can discriminate a clock failure from a logic portion 
failure. 

It is another object of the present invention to 
provide a clock failure recovery system which can 
shorten a time required for analyzing a failure fac- 
tor. 

In order to achieve the above objects of the present invention,
there is provided a clock failure recovery system in an
information processing system including a plurality of
processors and a diagnosis processor which can independently
execute failure recovery for the plurality of processors,
comprising: a clock distributor for generating one or a
plurality of clock signals used in the information processing
system and distributing the clock signals to the processors;
clock failure detectors for monitoring the clock signal output
from the clock distributor and for, when an error occurs,
generating an error signal in units of corresponding
processors; and interrupt generating means, having error signal
holding means for holding the error signal, for interrupting
the diagnosis processor in response to the error signal,
wherein execution start of a failure recovery program including
the step of recognizing a processor in which a clock failure
occurs and executing logging of a clock failure event and a
clock failure state is controlled in an interrupt processing
program of the diagnosis processor, which is started upon
interrupt.



Brief Description of the Drawings 

Fig. 1 is a block diagram showing an information processing
system to which an embodiment of a clock failure recovery
system according to the present invention is applied;

Figs. 2 to 4 are flow charts showing failure recovery
programs; and

Fig. 5 is an actual flow chart for executing the programs
shown in Figs. 2 to 4.

Description of the Preferred Embodiments 

Fig. 1 shows an information processing system to which an
embodiment of a clock failure recovery system of the present
invention is applied. The information processing system
comprises processors 1 to 3, a clock oscillator 4, a clock
distributor 5, clock failure detectors 61 to 63, an interrupt
signal generator 7 as an interrupt generating means, a
diagnosis processor 8, clock signal lines 101 to 103, 111 to
113, 121 to 123, 131 to 133, and 171, and signal lines 141 to
143, 151 to 154, and 161.

The processor 1 is constituted by a plurality of packaging
units 1-1 to 1-n. Each of the processors 2 and 3 is similarly
constituted by a plurality of packaging units as in the
processor 1. The clock oscillator 4 supplies a clock signal to
the clock distributor 5 through the clock signal line 171. The
clock distributor 5 independently distributes the clock signal
through the clock signal lines 111 to 113, 121 to 123, and 131
to 133 in units of packaging units of the processors 1 to 3.

The clock failure detectors 61 to 63 monitor the clock signals
supplied from the clock distributor 5 to the packaging units of
the processors 1 to 3 through the clock signal lines 101 to
103. When the detectors 61 to 63 detect an error, they report
it to the interrupt signal generator 7 through the signal lines
141 to 143.

When the interrupt signal generator 7 is informed of an error
from the clock failure detectors 61 to 63, it generates an
interrupt signal, and outputs it to the diagnosis processor 8
through the signal line 161. The interrupt signal generator 7
has a status register 71 serving as an error signal holding
means. The status register 71 holds data indicating a clock
distribution system in which a failure occurs.

The diagnosis processor 8 diagnoses the processors 1 to 3, and
comprises an interrupt processing controller 81 and a failure
recovery controller 82. The interrupt processing controller 81
recognizes that a clock error interrupt occurs when the
interrupt signal input through the signal line 161 goes active,
and reports it to the failure recovery controller 82. The
failure recovery controller 82 performs failure recovery
control of the processors 1 to 3, and can degrade the packaging
units 1-1 to 1-n and can disconnect the processor 1 from the
information processing system. The same operation as for the
processor 1 is executed for the processors 2 and 3. The failure
recovery controller 82 can perform read access of the status
register 71 through the signal line 154, and logging of failure
information.

With the above arrangement, when an error occurs in a clock
signal supplied from the clock distributor 5 to the processors
1 to 3 through the clock signal lines 111 to 113, 121 to 123,
and 131 to 133, the clock failure detectors 61 to 63 detect it,
and supply a message indicating it to the interrupt signal
generator 7 through the signal lines 141 to 143. Upon reception
of the message, the interrupt signal generator 7 generates an
interrupt signal to interrupt the diagnosis processor 8 through
the signal line 161. The interrupt processing controller 81 of
the diagnosis processor 8 reads the status register 71 to
recognize a processor in which a clock failure occurs. The
controller 81 then instructs the failure recovery controller 82
to start a failure recovery program. The failure recovery
controller 82 executes the failure recovery program, thereby
performing clock failure recovery processing.
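The interrupt path described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the circuitry of the embodiment: the class and function names are hypothetical, the one-bit-per-processor layout of the status register 71 is assumed (the text only says the register holds data indicating the failing clock distribution system), and clearing the interrupt flag on handling is likewise an assumption.

```python
from __future__ import annotations
from dataclasses import dataclass, field

NUM_PROCESSORS = 3  # processors 1 to 3 in Fig. 1


@dataclass
class InterruptSignalGenerator:
    """Models interrupt signal generator 7 (names are hypothetical)."""
    status_register: int = 0        # stands in for status register 71
    interrupt_active: bool = False  # stands in for signal line 161

    def report_error(self, processor: int) -> None:
        # Called by a clock failure detector (61 to 63).
        # Assumed layout: bit (processor - 1) marks the failing system.
        self.status_register |= 1 << (processor - 1)
        self.interrupt_active = True


def failed_processors(status: int) -> list[int]:
    """Decode the assumed one-bit-per-processor status word."""
    return [p for p in range(1, NUM_PROCESSORS + 1)
            if status & (1 << (p - 1))]


@dataclass
class DiagnosisProcessor:
    """Models diagnosis processor 8 (controllers 81 and 82 combined)."""
    log: list[str] = field(default_factory=list)

    def handle_interrupt(self, gen: InterruptSignalGenerator) -> list[int]:
        if not gen.interrupt_active:
            return []
        status = gen.status_register           # read over signal line 154
        failed = failed_processors(status)
        self.log.append("clock failure event")
        self.log.append(f"status register = {status:#b}")
        gen.interrupt_active = False           # assumption: acknowledge
        return failed
```

For example, if the detectors for processors 1 and 3 both report an error before the interrupt is serviced, `handle_interrupt` returns both processor numbers from a single read of the register.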

Figs. 2 to 4 are flow charts showing the failure recovery
programs.

In a failure recovery program (failure recovery A) shown in
Fig. 2, a clock failure event is logged in step 90. In step 91,
the status register 71 is read to recognize a clock
distribution system in which a clock failure occurs. In step
92, the content of the status register 71 is logged. Finally,
in step 93, failure recovery of the entire information
processing system is executed.

In a failure recovery program (failure recovery B) shown in
Fig. 3, after steps 90, 91, and 92 are executed similarly,
failure recovery of only a processor in which a clock failure
occurs is performed.

In a failure recovery program (failure recovery C) shown in
Fig. 4, after steps 90, 91, and 92 are executed similarly, the
interrupt from a processor in which a clock failure occurs is
inhibited for a predetermined period of time.
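The interrupt inhibition of failure recovery C can be sketched as a per-processor deadline. This is only an illustration: the class name, the use of seconds as the time unit, and passing the current time in as an argument are all assumptions, since the text does not say how the predetermined period is measured or enforced.

```python
class InterruptInhibitor:
    """Hypothetical model of inhibiting clock failure interrupts
    from one processor for a predetermined period of time."""

    def __init__(self, inhibit_period: float) -> None:
        self.inhibit_period = inhibit_period
        self._inhibited_until: dict = {}  # processor -> end of period

    def inhibit(self, processor: int, now: float) -> None:
        # Start (or restart) the inhibition period for this processor.
        self._inhibited_until[processor] = now + self.inhibit_period

    def is_inhibited(self, processor: int, now: float) -> bool:
        # Interrupts are accepted again once the period has elapsed.
        return now < self._inhibited_until.get(processor, float("-inf"))
```

Interrupts from other processors are unaffected, which matches the text's restriction of the inhibition to the processor in which the clock failure occurs.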

As described above, when a clock failure occurs in the
information processing system of the present invention,
occurrence of a clock failure is correctly recognized, proper
failure recovery processing is executed, and a clock failure
event and its location are logged, thereby presenting effective
data for failure factor analysis.

The failure recovery processing operations described above are
executed as one flow chart shown in Fig. 5. In Fig. 5, the same
step numbers designate the same processing steps as in the flow
charts in Figs. 2 to 4. Steps 90 to 92 are executed in the same
way for every failure recovery. After step 92 is executed, the
flow advances to step 110 to check whether the failure occurs
in a processor, such as a system controller, which influences
the entire system. If YES in step 110, the flow advances to
step 93 to execute the failure recovery A, so that failure
recovery processing of the entire system is executed. If it is
determined in step 110 that the failure does not occur in such
a processor, the flow advances to step 111. In step 111, it is
checked whether or not the failure is a fatal one which is
confined to one processor, in other words, whether or not the
failure can be recovered as a fatal failure in the processor of
interest. If YES in step 111, the failure recovery B is
executed, so that failure recovery processing of the processor
in which the clock failure occurs is executed. If it is
determined in step 111 that the failure is not a fatal failure
confined to one processor, e.g., an error of a display, the
flow advances to step 95 to execute the failure recovery C.
Thus, a failure interrupt from the processor in which the clock
failure occurs is inhibited for a predetermined period of time.
When such a failure occurs frequently, maintenance is performed
in step 95 to cope with this failure.

In this case, which of steps 93, 94, and 95 is executed is
determined depending on the conditions of the clock failure.
The conditions of clock failures include (1) a line is
disconnected and no signal is input, (2) a clock cycle is
disordered, (3) an object to be controlled falls outside a
control range despite PLL control, and so on. If such a
condition influences only one processor, processing in step 94
or 95 is executed; when a condition is related to the entire
system, processing in step 93 is executed.
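The combined flow of Fig. 5 can be summarized in a short Python sketch. The function and enum names are hypothetical, the two boolean inputs stand in for the checks of steps 110 and 111 (the text does not say how they are evaluated), and step 94 is taken here to be failure recovery B, which the description implies but does not state outright.

```python
from enum import Enum


class Recovery(Enum):
    A = "recover the entire system"             # step 93
    B = "recover only the failing processor"    # step 94 (inferred)
    C = "inhibit the interrupt for a period"    # step 95


def select_recovery(affects_entire_system: bool,
                    fatal_within_processor: bool) -> Recovery:
    if affects_entire_system:       # step 110
        return Recovery.A
    if fatal_within_processor:      # step 111
        return Recovery.B
    return Recovery.C


def run_recovery(status_register: int,
                 affects_entire_system: bool,
                 fatal_within_processor: bool):
    """Steps 90 to 92 are common; the branch follows afterwards."""
    log = ["clock failure event"]                       # step 90
    status = status_register                            # step 91
    log.append(f"status register = {status:#b}")        # step 92
    return log, select_recovery(affects_entire_system,
                                fatal_within_processor)
```

Keeping the logging steps ahead of the branch reflects the point of the description: the clock failure event and its location are recorded regardless of which recovery is chosen.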

As described above, according to the present invention, a clock
error is detected by a clock failure detector, and a clock
failure event and a location of the clock failure are logged by
a failure recovery program, thus discriminating a clock failure
from a logic portion failure. Therefore, the time required for
analyzing a failure factor can be shortened.


Claims 

1. A clock failure recovery system in an information processing
system including a plurality of processors and a diagnosis
processor which can independently execute failure recovery for
said plurality of processors, comprising:

a clock distributor for generating one or a plurality of clock
signals used in said information processing system and
distributing the clock signals to said processors;

clock failure detectors for monitoring the clock signal output
from said clock distributor and for, when an error occurs,
generating an error signal in units of corresponding
processors; and

interrupt generating means, having error signal holding means
for holding the error signal, for interrupting said diagnosis
processor in response to the error signal,

wherein execution start of a failure recovery program including
the step of recognizing a processor in which a clock failure
occurs and executing logging of a clock failure event and a
clock failure state is controlled in an interrupt processing
program of said diagnosis processor, which is started upon
interrupt.